In this lab we are going to work on how to estimate the background from 'real' data. Real is in air quotes because the data actually come from simplified simulations, to keep the problems manageable in a single lab, but they have some features that resemble those of real data sets.
In general, exchanging raw data is a pain. Writing data in text files is error-prone and inaccurate and wastes space, but raw binary files have all sorts of subtleties too (including the internal byte order of your processor). To sidestep a whole set of nasty issues, we are going to use HDF5 (originally developed by the National Center for Supercomputing Applications for data exchange) to allow everyone to import the data.
If you are working on your own machine, in either Matlab or python, follow the links in the assignment to download the files and put them in your working directory.
If you are using python on your own machine, you will need to install the h5py library. Go to your Anaconda environment (if you followed the suggestions in the first lab) and search for the h5py library. Install it into your environment, then restart your JupyterLab or notebook session.
If you are working in cloud python, the necessary library is already installed, but you need a magic incantation to download the file from the course website to your cloud instance. From the website, find and copy the link address to the data file (often this means right- or option-clicking on the link). Then open the terminal in your cloud instance, navigate to your working directory, and use the following command structure.
So my command (your link might be different) for the first file is
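The download command itself did not survive in this copy of the write-up; assuming the standard `wget` tool, the general shape is as below. The URL is a placeholder, not the real link — substitute the link address you copied from the course site.

```shell
# placeholder URL: paste the real link address copied from the course website
wget https://example.edu/course/data/gammaray_lab4.h5
```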
Running ls -lh confirms this downloaded a 792 MB file named gammaray_lab4.h5 into my directory.
Now start your lab with an import block that looks like:
%matplotlib inline
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import scipy
from scipy import stats
import h5py
import pandas as pd
#this sets the size of the plot to something useful
plt.rcParams["figure.figsize"] = (20,15)
Now import the file for problem 1 using
hf = h5py.File('gammaray_lab4.h5', 'r')
Within hdf5 files you can store different kinds of data sets. Ours is simple and has one called 'data'. You can look at the header and see this using
hf.keys()
<KeysViewHDF5 ['data']>
You can then import the data into an array variable using the get method
data = np.array(hf.get('data'))
Check that your data are as expected, with a time (in GPS seconds), Solar phase (deg), Earth longitude (deg), and gamma-ray counts, and more than 25 million data records. Printing the first record (stored as the first column of the array) you should get:
data[:,0]
array([9.40680016e+08, 3.15000000e+02, 4.50000000e+01, 1.00000000e+01])
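A minimal self-contained sketch of this layout, using a tiny stand-in array rather than the real file (the second record's values are made up for illustration): the quantities are stored as rows, so `data[:, i]` pulls out record `i`.

```python
import numpy as np

# stand-in for the real (4, N) array: rows are time (GPS s), solar phase
# (deg), Earth longitude (deg), and gamma-ray counts; columns are records
data = np.array([[9.40680016e8, 9.40680016e8 + 0.1],
                 [315.0, 315.1],
                 [45.0, 45.1],
                 [10.0, 7.0]])

assert data.shape[0] == 4        # four quantities per record
first_record = data[:, 0]        # one record = one column
assert first_record[3] == 10.0   # gamma-ray counts of the first record
```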
You can then close your file
hf.close()
In this problem we are looking at the data from a gamma-ray satellite orbiting in low Earth orbit. It takes a reading of the number of particles detected every 100 milliseconds, and is in an approximately 90 minute orbit. While it is looking for gamma-ray bursts, virtually all of the particles detected are background cosmic rays.
As with most data, there are 'features.' The meta-data is incorporated into the data file we imported above.
The data has 4 columns and more than 25 million rows. The columns are time (in gps seconds), Solar phase (deg) showing the position of the sun relative to the orbit, Earth longitude (deg) giving the position of the spacecraft relative to the ground, and particle counts.
To get to know our data better, we will plot it and its metadata below. Through a scavenger hunt of trial and error, we will look for patterns that let us infer the properties of the data.
I will be using pandas to create a dataframe for my problems.
def plot_line_step(X, Y, start, end, step):
    fsize = 30
    fig, ax = plt.subplots(1, 1)
    plt.tick_params(labelsize = 20)
    ax.plot(X[start:end:step], Y[start:end:step], label = 'Data', linewidth = 3)
    ax.set_xlabel(f'{X.name}', fontsize = fsize)
    ax.set_ylabel(f'{Y.name}', fontsize = fsize)
    ax.set_title(f'{Y.name} vs. {X.name}', fontsize = fsize, fontweight = 'bold')
    ax.legend()
    plt.show()
def plot_points_step(X, Y, start, end, step):
    fsize = 30
    fig, ax = plt.subplots(1, 1)
    plt.tick_params(labelsize = 20)
    ax.scatter(X[start:end:step], Y[start:end:step], label = 'Data points', linewidth = 3)
    ax.set_xlabel(f'{X.name}', fontsize = fsize)
    ax.set_ylabel(f'{Y.name}', fontsize = fsize)
    ax.set_title(f'{Y.name} vs. {X.name}', fontsize = fsize, fontweight = 'bold')
    ax.legend()
    plt.show()
def plot_density(t, X, start, end, step, bins):
    fsize = 30
    fig, ax = plt.subplots(1, 1)
    plt.tick_params(labelsize = 20)
    h = ax.hist2d(t[start:end:step], X[start:end:step], bins, linewidth = 3, cmap='plasma')
    ax.set_xlabel(f'{t.name}', fontsize = fsize)
    ax.set_ylabel(f'{X.name}', fontsize = fsize)
    ax.set_title('Data Density (2D Histogram)', fontsize = fsize, fontweight = 'bold')
    ax.set_ylim([0, 20])  # restrict the counts axis to the populated range
    fig.colorbar(h[3], ax=ax)
    plt.show()
def plot_fold_density(t, X, bins, fold):
    fsize = 30
    t_mod = t % fold  # fold the time axis onto a single period
    fig, ax = plt.subplots(1, 1)
    plt.tick_params(labelsize = 20)
    h = ax.hist2d(t_mod, X, bins, linewidth = 3, cmap='plasma')
    ax.set_xlabel(f'{t.name}', fontsize = fsize)
    ax.set_ylabel(f'{X.name}', fontsize = fsize)
    ax.set_title('Folded Data (2D Histogram)', fontsize = fsize, fontweight = 'bold')
    fig.colorbar(h[3], ax=ax)
    plt.show()
We use pandas to simplify our data manipulation process.
df = pd.DataFrame(data).T
df.columns = ['GPS Time (s)', 'Solar phase (deg)', 'Earth longitude (deg)', 'Particle counts']
We first look at how the different data columns vary with GPS Time.
plot_points_step(df['GPS Time (s)'], df['Particle counts'], 0, df['GPS Time (s)'].size, 100_000)
plot_points_step(df['GPS Time (s)'], df['Solar phase (deg)'], 0, df['GPS Time (s)'].size, 100_000)
plot_points_step(df['GPS Time (s)'], df['Earth longitude (deg)'], 0, df['GPS Time (s)'].size, 100_000)
Then we can look at how the particle counts vary for the different data columns.
plot_points_step(df['Earth longitude (deg)'], df['Particle counts'], 0, df['Earth longitude (deg)'].size, 10_000)
plot_points_step(df['Solar phase (deg)'], df['Particle counts'], 0, df['Solar phase (deg)'].size, 10_000)
Let's find out whether the counts follow a known probability distribution.
X = df['Particle counts']
fsize = 30
fig, ax = plt.subplots(1, 1)
plt.tick_params(labelsize = 20)
ax.hist(X[0:100_000], bins = 300, label = 'Data', linewidth = 3, density = True)
ax.set_xlabel(f'{X.name}', fontsize = fsize)
ax.set_ylabel('Probability Mass', fontsize = fsize)
ax.set_title(f'Probability Mass vs. {X.name}', fontsize = fsize, fontweight = 'bold')
ax.legend()
plt.show()
The background appears to follow a Poisson distribution. We note this as we plan how to build a $pdf()$ for the background, and we check it below by plotting a Poisson distribution on top of the data.
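As a self-contained sanity check on synthetic data (a mean of 6 is assumed here just for the illustration), a frequency histogram of Poisson draws really does track `stats.poisson.pmf`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.poisson(6, 100_000)     # synthetic Poisson(6) counts

k = np.arange(0, 26)
pmf = stats.poisson.pmf(k, 6)
emp = np.array([(samples == ki).mean() for ki in k])

# empirical frequencies should match the pmf to within sampling noise
assert np.max(np.abs(emp - pmf)) < 0.01
```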
mean1 = 6
k = np.arange(0, 26)
fig, ax = plt.subplots(1, 1)
plt.tick_params(labelsize = 20)
ax.hist(X[0:100_000], bins = 300, label = 'Data', linewidth = 3, density = True)
# pmf takes (k, mu); the factor 11 roughly undoes the density normalization
# of the finely-binned histogram
ax.plot(k, stats.poisson.pmf(k, mean1)*11, label = 'poisson', linewidth = 3)
ax.set_xlabel(f'{X.name}', fontsize = fsize)
ax.set_ylabel('Probability Mass', fontsize = fsize)
ax.set_title(f'Probability Mass vs. {X.name}', fontsize = fsize, fontweight = 'bold')
ax.legend()
plt.show()
mean1 = 6
k = np.arange(0, 26)
fig, ax = plt.subplots(1, 1)
plt.tick_params(labelsize = 20)
ax.hist(X[0:100_000], bins = 300, label = 'Data', linewidth = 3, density = True)
ax.plot(k, stats.poisson.pmf(k, mean1)*11, label = 'poisson', linewidth = 3)
ax.set_xlabel(f'{X.name}', fontsize = fsize)
ax.set_ylabel('Probability Mass', fontsize = fsize)
ax.set_title(f'Probability Mass vs. {X.name}', fontsize = fsize, fontweight = 'bold')
ax.set_yscale('log')
ax.legend()
plt.show()
However, when we overlay a Poisson distribution (tuned by trial and error), we see that the data are not fully described by it. We will keep this in mind going forward.
The background is not consistent across the dataset, so we will find and describe as accurately as we can how the background changes.
When we plot the particle counts against GPS time and look closer, the background appears to vary periodically.
plot_line_step(df['GPS Time (s)'], df['Particle counts'], 0, df['GPS Time (s)'].size, 10_000)
There is a faint periodic signal here, so let's try folding the data. Folding means plotting against time modulo a trial period: every time we cross that interval we 'restart' from the beginning of the x-axis, so the plot shows the signal as a function of position within the period rather than of absolute time. We check whether the data vary with the satellite's position around the Earth (its Earth longitude).
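The folding idea can be sketched on synthetic data (the 5400 s period and 0.5 s sampling below are assumptions chosen for the illustration): taking the time axis modulo the period maps samples from successive cycles onto the same phase.

```python
import numpy as np

period = 5400.0                            # assumed fold period in seconds
t = np.arange(0, 10 * period, 0.5)         # ten periods; 0.5 s steps keep the arithmetic exact
signal = np.cos(2 * np.pi * t / period)    # synthetic periodic background

phase = t % period                         # fold every time into one cycle
one_period = 10_800                        # samples per period at 0.5 s spacing

# samples exactly one period apart fold onto the same phase and the same value
assert abs(phase[one_period] - phase[0]) < 1e-9
assert abs(signal[one_period] - signal[0]) < 1e-9
```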
t = df['Earth longitude (deg)']
X = df['Particle counts']
bins = 30
start = 0
end = X.size
step = 1
plot_density(t, X, start, end, step, bins)
It is clear that the background varies with Earth longitude. Since the background follows a Poisson distribution, the mean (and peak, the brightest yellow in the 2D histogram) of the distribution varies as the satellite orbits the Earth, and the pattern repeats every orbit (every 360 degrees of Earth longitude).
Moreover, the mean of this Poisson distribution seems to decay exponentially across the orbit.
Thus we have a Poisson distribution whose mean depends on time. To model it, we need the mean of each vertical bin and a description of how that mean changes over time; we can use longitude, which is itself a function of time.
Create a model for the background that includes time dependence, and explicitly compare your model to the data. How good is your model of the background?
Now, we attempt to create a model for the background. Let's try by using the information we found above.
start,end,step = (0,300000,1)
X = df['GPS Time (s)']
Y = df['Earth longitude (deg)']
fsize = 30
fig, ax = plt.subplots(1, 1)
plt.tick_params(labelsize = 20)
ax.plot(X[start:end:step], Y[start:end:step], label = 'Data points', linewidth = 3)
ax.set_xlabel(f'{X.name}', fontsize = fsize)
ax.set_ylabel(f'{Y.name}', fontsize = fsize)
ax.set_title(f'{Y.name} vs. {X.name}', fontsize = fsize, fontweight = 'bold')
ax.legend()
plt.show()
As we can see, the Earth Longitude varies periodically with a set time interval in GPS Time. Let's find the period in GPS Time by looking at when the Earth Longitude passes the 0 Meridian.
meridian = 0
df[df['Earth longitude (deg)']==meridian][:10].index
Int64Index([47250, 101250, 155250, 209250, 263250, 317250, 371250, 425250,
479250, 533250],
dtype='int64')
a = df[df['Earth longitude (deg)']==meridian][:10].index[0]
b = df[df['Earth longitude (deg)']==meridian][:10].index[1]
a,b
(47250, 101250)
orbit_interval = df['GPS Time (s)'][b] - df['GPS Time (s)'][a]
orbit_interval
5400.0
print(f'Thus our time interval the satellite spends during one Earth orbit is {orbit_interval} s.')
Thus our time interval the satellite spends during one Earth orbit is 5400.0 s.
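The crossing-based period estimate can be checked on synthetic indices (assuming, as above, 10 samples per second and one zero-longitude crossing per 5400 s orbit):

```python
import numpy as np

samples_per_orbit = 54_000                  # 5400 s at 10 samples per second
idx = np.arange(200_000)
# synthetic 'meridian hits': every sample index that starts a new orbit
crossings = np.where(idx % samples_per_orbit == 0)[0]

# consecutive crossings are exactly one orbit apart
gaps = np.diff(crossings)
assert (gaps == samples_per_orbit).all()
assert abs(gaps[0] * 0.1 - 5400.0) < 1e-6   # back to seconds
```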
# elapsed time within each orbit (note: this is a time offset, not yet the Poisson mean)
mean = (df['GPS Time (s)'] % 5400)
mean[:109000]
0 16.0
1 16.1
2 16.2
3 16.3
4 16.4
...
108995 115.5
108996 115.6
108997 115.7
108998 115.8
108999 115.9
Name: GPS Time (s), Length: 109000, dtype: float64
t = ((df['GPS Time (s)']-16) % 5400)
t
0 0.0
1 0.1
2 0.2
3 0.3
4 0.4
...
25919996 5399.6
25919997 5399.7
25919998 5399.8
25919999 5399.9
25920000 0.0
Name: GPS Time (s), Length: 25920001, dtype: float64
It seems the mean varies between Particle Counts of about 11 and 6 over a time interval of 5400 s (shifted by about 16 s).
Our model for the mean needs to satisfy these "boundary conditions".
$$ \mu(t) = 11 \cdot p^{t} $$
$$ 11 \cdot p^{0} = 11 $$
$$ 11 \cdot p^{5400} = 6 $$
p = np.exp(np.log(6/11)/5400)
p
0.9998877589284689
t1 = 0
t2 = 5400
11*(p**(t1)), 11*(p**(t2))
(11.0, 6.000000000000629)
Thus, our background pdf is
t = ((df['GPS Time (s)']-16) % 5400)
mean_array = 11*(p**(t))
background = stats.poisson(mu=mean_array)
Because the background varies, your discovery sensitivity threshold (how many particles you would need to see) also varies. What is the '5-sigma' threshold for a 100 millisecond GRB at different times?
Optional: while this is simulated data, it is based on a real effect seen by low Earth orbit satellites. Can you identify the cause of the variable background and propose a physical model?
Since our background varies, so will the discovery threshold. The 5-sigma threshold (for example, for a 100 millisecond GRB) at different times is:
sigma = 5
prob = stats.norm.sf(sigma)
prob
2.866515718791933e-07
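The threshold logic can be sketched on its own: take the one-sided 5-sigma tail probability and invert the Poisson survival function at an assumed background mean (11 counts per bin here, matching the start-of-orbit value; the numbers are illustrative):

```python
from scipy import stats

sigma = 5
p_tail = stats.norm.sf(sigma)              # one-sided 5-sigma tail probability

mu = 11.0                                  # assumed background mean per bin
threshold = stats.poisson.isf(p_tail, mu=mu)

# the threshold sits well above the mean, and its tail beats the 5-sigma level
assert threshold > mu
assert stats.poisson.sf(threshold, mu) <= p_tail
```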
At the start of an orbit ($t=0$ s, row index 0):
time1 = 0  # row index; t = 0 s into the orbit
threshold1 = stats.poisson.isf(prob, mu=mean_array[time1])
threshold1
Near the end of an orbit (row index 50_000, i.e. $t=5000$ s into the orbit, since rows are 0.1 s apart):
time2 = 50_000
threshold2 = stats.poisson.isf(prob, mu=mean_array[time2])
threshold2
print(f'So at t=0 s into the orbit the 5-sigma threshold is {threshold1} cosmic rays and at t=5000 s it is {threshold2} cosmic rays.')
This varying background may be due to the satellite's changing radiation environment around its orbit. Since the variation tracks Earth longitude rather than solar phase, a geomagnetic cause is more plausible than periodic exposure to the Sun or Moon: low-Earth-orbit satellites, for example, see elevated particle backgrounds when passing through the South Atlantic Anomaly.
In this problem we are going to look at a stack of telescope images (again simulated). We have 10 images, each with transients (something like a super nova that only appears in one image) and faint stars. We will be looking for transients.
We download the data from images.h5. This is a stack of 10 square images, each 200 pixels on a side.
hf = h5py.File('images.h5', 'r')
hf.keys()
<KeysViewHDF5 ['image1', 'imagestack']>
image1 = np.array(hf.get('image1'))
imagestack = np.array(hf.get('imagestack'))
hf['image1'], hf['imagestack']
(<HDF5 dataset "image1": shape (200, 200), type "<f8">, <HDF5 dataset "imagestack": shape (200, 200, 10), type "<f8">)
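The axis order matters when slicing: with shape `(200, 200, 10)`, a single image is a slice along the last axis, not the first. A quick check on a dummy stack:

```python
import numpy as np

stack = np.zeros((200, 200, 10))     # (row, column, image number)
stack[50, 60, 3] = 1.0               # one bright pixel in image 3

img3 = stack[:, :, 3]                # correct: all rows/columns of image 3
assert img3.shape == (200, 200)
assert img3[50, 60] == 1.0

assert stack[3].shape == (200, 10)   # stack[3] is ROW 3, not image 3
```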
image1[:,0]
array([-1.24711126, 0.09727501, 0.33900351, 0.29558254, 1.91368599,
0.07626013, 0.03986838, -0.25576938, -0.22294167, -0.65113439,
-0.01398361, 0.70831161, -0.09705982, -0.30476611, -0.47555002,
0.29868733, -0.35539496, -0.12378233, -0.40464966, 0.07679724,
-0.25425321, 0.03919398, -0.54560791, -0.26625077, 0.07074571,
-0.13175857, -0.18360689, -0.14991885, -0.29985518, 1.37532486,
-0.19475698, -0.36960906, 0.48550722, -0.67534756, -0.64634097,
0.88276573, -0.16895827, -0.42872912, -0.02869665, 0.43524321,
0.24346419, 1.10485685, -0.33400717, 0.51251083, 0.90152251,
0.92508949, -0.83172978, 0.45036765, 1.16175577, -0.80162552,
-0.36378374, 0.48153328, -0.32109448, 0.14591758, 0.77390384,
0.11669285, 0.5746037 , 0.27299184, 0.4348156 , -0.15030213,
0.05540169, 0.38094349, 0.6311626 , 0.19981589, 0.63349839,
0.43168284, 0.09893955, -0.85494294, 0.93242997, 0.37957992,
-0.08278824, 0.80682018, 0.15010843, -0.95922717, -1.26249655,
-0.94637467, -0.15352023, -0.07384016, -0.41803639, 0.4982668 ,
0.66809981, 0.72741214, -0.18094812, 0.0972217 , -0.2418176 ,
-1.06333105, -0.03102494, -0.19620065, 0.25379191, -0.33746636,
-0.27249717, 0.0701126 , 0.60899958, -0.73714658, -0.4902708 ,
-0.22620815, 0.45333986, 0.19660499, -0.80944926, 0.19988036,
-1.1648606 , -0.16066758, -0.45766878, -0.02665867, 0.46653855,
0.2220366 , -0.60288937, -0.52148233, -0.29863143, -0.27530732,
-0.5097406 , -0.06689021, 0.26683793, -0.09194396, 0.68719753,
0.2019346 , 0.74079708, 0.00796042, 0.55804657, 0.20961482,
0.17362284, -0.32905264, 0.21144649, 0.19887555, 0.50027729,
0.41176336, 0.2161396 , 0.7559562 , 0.10839095, -0.54405494,
0.67476506, 0.33172359, 0.58006441, 0.10053878, 0.09450886,
0.24487801, -0.96764838, 0.02257392, -0.05627404, -1.09493906,
-0.30909209, 0.31138105, -0.13693753, 1.22042577, 0.35058191,
0.26441676, 0.95097704, 0.22796594, -0.42044 , 0.79805597,
0.65795808, -0.53318533, -0.55597433, 0.07928304, -0.09236736,
-1.62689272, -0.02254982, 0.37533826, 0.46281651, 0.4968663 ,
-0.61484966, -0.48875166, 0.04894066, 0.39827377, -0.10714246,
-0.76350925, -0.37130876, -1.56810692, -0.10983415, -0.47270383,
-0.49466462, -0.77631473, 0.42848326, 0.95483591, 0.25712471,
-0.05448012, -0.10635216, -0.11081593, -0.00479448, -0.40168115,
-0.00845614, 0.9981043 , -0.03942449, 0.35160042, -0.29764412,
-0.84577687, -1.21221407, 0.25012219, -0.92180976, 0.68394382,
-0.13802771, 0.52752293, 0.28477364, -0.04521836, 0.4203928 ,
-0.03778918, -0.21000001, 0.5566705 , -0.32054855, 1.16270664])
You can then close your file
hf.close()
Explore the data. Is there signal contamination? Is the background time dependent? Is it consistent spatially? Develop a plan to calculate your background pdf().
We then explore the data to see whether there is signal contamination, whether the background is time dependent, and whether it is consistent spatially.
We will eventually calculate a background $pdf()$.
print(image1.shape)
(200, 200)
plt.imshow(image1, cmap='Greys')
plt.colorbar()
We can display each of the 10 images in the imagestack like this:
plt.rcParams["figure.figsize"] = (6,6)
for imagenumber in range(0,10):
    plt.imshow(imagestack[:,:,imagenumber], cmap='Greys')
    plt.colorbar()
    plt.show()
There is most definitely signal contamination: faint stars appear as light gray dots scattered among the darker black dots of the brighter sources.
plt.plot(image1);
for imagenumber in range(0,10):
    plt.plot(imagestack[:,:,imagenumber]);
    plt.show()
Again, here we easily see that some of the images contain clear detections (transients) in the form of spikes, while others show only background noise. So what does our noise profile look like?
We're looking at intensity vs. counts of pixels with that intensity $\rightarrow$ Essentially the distribution of intensity over the CCD.
flat_list = []
for sublist in image1:
    for item in sublist:
        flat_list.append(item)
min(flat_list), max(flat_list)
(-2.3061570283216652, 46.84317561571535)
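Incidentally, the nested-loop flattening above is equivalent to numpy's `ravel`, which avoids the Python-level loops; a small check:

```python
import numpy as np

img = np.arange(6.0).reshape(2, 3)

# flatten with explicit loops, as done above
flat_loop = []
for row in img:
    for item in row:
        flat_loop.append(item)

# ravel() walks the array in the same row-major order
assert np.array_equal(np.array(flat_loop), img.ravel())
```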
bins = np.arange(np.floor(min(flat_list)), np.ceil(max(flat_list)+1))
fig, ax = plt.subplots(1, 1)
plt.tick_params(labelsize = 20)
ax.hist(flat_list, bins, linewidth = 3)
ax.set_xlabel(f'Intensity', fontsize = fsize)
ax.set_ylabel(f'Number of pixels', fontsize = fsize)
ax.set_title('CCD intensity vs. number of pixels', fontsize = fsize, fontweight = 'bold')
ax.set_yscale('log')
plt.show()
Maybe this hints at a Gaussian distribution?
background = stats.norm(loc=0, scale=0.6)
x = np.linspace(-3, 47, 1000)
fig, ax = plt.subplots(1, 1)
plt.tick_params(labelsize = 20)
ax.hist(flat_list, bins, linewidth = 3)
ax.plot(x, background.pdf(x)*4e4, linewidth = 3, label = 'gaussian')
ax.set_xlabel(f'Intensity', fontsize = fsize)
ax.set_ylabel(f'Number of pixels', fontsize = fsize)
ax.set_title('CCD intensity vs. number of pixels', fontsize = fsize, fontweight = 'bold')
ax.set_yscale('log')
ax.set_ylim([1e-1, 3e4])
ax.legend()
plt.show()
background = stats.rayleigh(loc=-3, scale=2)
x = np.linspace(-3, 47, 1000)
fig, ax = plt.subplots(1, 1)
plt.tick_params(labelsize = 20)
ax.hist(flat_list, bins, linewidth = 3)
ax.plot(x, background.pdf(x)*6e4, linewidth = 3, label = 'rayleigh')
ax.set_xlabel(f'Intensity', fontsize = fsize)
ax.set_ylabel(f'Number of pixels', fontsize = fsize)
ax.set_title('CCD intensity vs. number of pixels', fontsize = fsize, fontweight = 'bold')
ax.set_yscale('log')
ax.set_ylim([1e-1, 3e4])
ax.legend()
plt.show()
background = stats.expon(loc=0, scale=1)
fig, ax = plt.subplots(1, 1)
plt.tick_params(labelsize = 20)
ax.hist(flat_list, bins, linewidth = 3)
ax.plot(x, background.pdf(x)*6e4, linewidth = 3, label='exponential')
ax.set_xlabel(f'Intensity', fontsize = fsize)
ax.set_ylabel(f'Number of pixels', fontsize = fsize)
ax.set_title('CCD intensity vs. number of pixels', fontsize = fsize, fontweight = 'bold')
ax.set_yscale('log')
ax.set_ylim([1e-1, 3e4])
ax.legend()
plt.show()
gaussian = stats.norm(loc=0, scale=0.6)
exponential = stats.expon(loc=0, scale=7)
x = np.linspace(-3, 47, 1000)
fig, ax = plt.subplots(1, 1)
plt.tick_params(labelsize = 20)
ax.hist(flat_list, bins, linewidth = 3)
ax.plot(x, gaussian.pdf(x)*4e4 + exponential.pdf(x)*1e2, linewidth = 3, label = 'gaussian + exponential')
ax.set_xlabel(f'Intensity', fontsize = fsize)
ax.set_ylabel(f'Number of pixels', fontsize = fsize)
ax.set_title('CCD intensity vs. number of pixels', fontsize = fsize, fontweight = 'bold')
ax.set_yscale('log')
ax.set_ylim([1e-1, 3e4])
ax.legend()
plt.show()
From above, it seems like the Gaussian distribution is our best bet.
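Rather than eyeballing the width, the by-eye scale of 0.6 can be cross-checked with `stats.norm.fit` after clipping away the bright tail. The data below are synthetic (Gaussian background plus a few bright 'sources'), so the numbers are illustrative only:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# synthetic CCD values: Gaussian background (sigma 0.6) plus bright sources
pixels = np.concatenate([rng.normal(0.0, 0.6, 10_000), np.full(5, 20.0)])

# clip away the obvious sources before fitting the background
clipped = pixels[pixels < 3.0]
loc_fit, scale_fit = stats.norm.fit(clipped)

assert abs(loc_fit) < 0.1            # background centered near zero
assert abs(scale_fit - 0.6) < 0.1    # recovered width close to the truth
```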
bins
array([-3., -2., -1., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.,
10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22.,
23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33., 34., 35.,
36., 37., 38., 39., 40., 41., 42., 43., 44., 45., 46., 47.])
# quick spatial-consistency check: sum the first 10 rows of image1
# column by column and histogram the summed intensity against column index
t = np.arange(200)
bins = 30
X = np.zeros(image1.shape[1])
for row in range(0, 10):
    X += image1[row]
fig, ax = plt.subplots(1, 1)
plt.tick_params(labelsize = 20)
ax.hist2d(t, X, bins, linewidth = 3)
ax.set_xlabel('Pixel column', fontsize = fsize)
ax.set_ylabel('Summed intensity', fontsize = fsize)
ax.set_title('Spatial check (2D Histogram)', fontsize = fsize, fontweight = 'bold')
plt.show()
Although finding the exact background $pdf()$ may be hard, we can put down a threshold and demand that any pixel value exceed it before we count it as a detection.
We try again for the imagestack.
# note: a single image of the stack is imagestack[:,:,i], not imagestack[i]
flat_list = imagestack[:,:,0].ravel()
flat_list.min(), flat_list.max()
bins = np.linspace(np.floor(min(flat_list)), np.ceil(max(flat_list)+1), 40)
fig, ax = plt.subplots(1, 1)
plt.tick_params(labelsize = 20)
ax.hist(flat_list, bins, linewidth = 3)
ax.set_xlabel(f'Intensity', fontsize = fsize)
ax.set_ylabel(f'Number of pixels', fontsize = fsize)
ax.set_title('CCD intensity vs. number of pixels', fontsize = fsize, fontweight = 'bold')
ax.set_yscale('log')
plt.show()
background = stats.norm(loc=0, scale=0.5)
x = np.linspace(-2, 2, 1000)
fig, ax = plt.subplots(1, 1)
plt.tick_params(labelsize = 20)
ax.hist(flat_list, bins, linewidth = 3)
ax.plot(x, background.pdf(x)*2.7e2, linewidth = 3, label = 'gaussian')
ax.set_xlabel(f'Intensity', fontsize = fsize)
ax.set_ylabel(f'Number of pixels', fontsize = fsize)
ax.set_title('CCD intensity vs. number of pixels', fontsize = fsize, fontweight = 'bold')
ax.set_yscale('log')
ax.set_ylim([1e-1, 3e4])
ax.legend()
plt.show()
flat_list = imagestack[:,:,1].ravel()
flat_list.min(), flat_list.max()
bins = np.linspace(np.floor(min(flat_list)), np.ceil(max(flat_list)+1), 40)
fig, ax = plt.subplots(1, 1)
plt.tick_params(labelsize = 20)
ax.hist(flat_list, bins, linewidth = 3)
ax.set_xlabel(f'Intensity', fontsize = fsize)
ax.set_ylabel(f'Number of pixels', fontsize = fsize)
ax.set_title('CCD intensity vs. number of pixels', fontsize = fsize, fontweight = 'bold')
ax.set_yscale('log')
plt.show()
background = stats.norm(loc=0, scale=0.5)
x = np.linspace(-2, 2, 1000)
fig, ax = plt.subplots(1, 1)
plt.tick_params(labelsize = 20)
ax.hist(flat_list, bins, linewidth = 3)
ax.plot(x, background.pdf(x)*2.7e2, linewidth = 3, label = 'gaussian')
ax.set_xlabel(f'Intensity', fontsize = fsize)
ax.set_ylabel(f'Number of pixels', fontsize = fsize)
ax.set_title('CCD intensity vs. number of pixels', fontsize = fsize, fontweight = 'bold')
ax.set_yscale('log')
ax.set_ylim([1e-1, 3e4])
ax.legend()
plt.show()
The above plots were included to show my thought process in how I arrived at interpreting the background pdf().
Let's pool the intensities of all the images in the imagestack.
flat_list = imagestack.ravel()
flat_list.min(), flat_list.max()
bins = np.linspace(np.floor(min(flat_list)), np.ceil(max(flat_list)+1), 40)
fig, ax = plt.subplots(1, 1)
plt.tick_params(labelsize = 20)
ax.hist(flat_list, bins, linewidth = 3)
ax.set_xlabel(f'Intensity', fontsize = fsize)
ax.set_ylabel(f'Number of pixels', fontsize = fsize)
ax.set_title('All images in image stack', fontsize = fsize, fontweight = 'bold')
# ax.set_yscale('log')
plt.show()
background = stats.norm(loc=0, scale=0.7)
x = np.linspace(-5, 5, 1000)
a = 1.5e4
fig, ax = plt.subplots(1, 1)
plt.tick_params(labelsize = 20)
ax.hist(flat_list, bins, linewidth = 3)
ax.plot(x, background.pdf(x)*a, linewidth = 3, label = 'gaussian')
ax.set_xlabel(f'Intensity', fontsize = fsize)
ax.set_ylabel(f'Number of pixels', fontsize = fsize)
ax.set_title('All images in image stack', fontsize = fsize, fontweight = 'bold')
ax.set_yscale('log')
ax.set_ylim([1e-1, 3e4])
ax.legend()
plt.show()
This background,
background = stats.norm(loc=0, scale=0.7)
kept as a frozen distribution (so we can call its sf() method below) and scaled by a = 1.5e4 when plotting, seems to be a good fit for the data.
Using your background distribution, hunt for your signal (either faint stars, or a transient). Describe what you find.
So let's use this background distribution to hunt for a signal (a transient). We will look for a candidate with one of the higher intensities in the plot - one with an intensity value of ~13.
Y = 13
prob = background.sf(Y) * a        # tail probability scaled by the factor a
sigma = abs(stats.norm.ppf(prob / a))
prob, sigma
(4.1045501696095266e-73, 18.57142857142857)
print(f'Thus, the sigma of a signal Y = {Y} is {sigma:.1f}.')
Thus, the sigma of a signal Y = 13 is 18.6.
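The `* a` step above plays the role of a trials factor. That idea can be checked independently: with N effectively independent pixels, the chance of at least one pure-background fluctuation beyond a per-pixel tail probability p is close to (and bounded by) N·p when N·p is small. N = 200 × 200 below is the single-image pixel count:

```python
from scipy import stats

N = 200 * 200                        # pixels in one image
p_pixel = stats.norm.sf(5)           # per-pixel 5-sigma tail probability

# exact probability of at least one background pixel above the threshold
p_anywhere = 1.0 - (1.0 - p_pixel) ** N

# for small N*p, the simple product N*p is a tight upper bound
assert p_anywhere < N * p_pixel
assert p_anywhere > 0.9 * N * p_pixel
```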
My lab partner (who chose to look for faint stars instead of transients) arrived at different pdf()s, even though we were using the same data.
This is likely because we were looking for different kinds of detections: hunting for transients means looking for a strong excess over the background, while hunting for faint stars means looking for a much weaker excess.